Introduction

Deep learning is a great approach for dealing with unstructured data such as text, sound, video, and images. There are many applications of deep learning in image classification and object detection, such as classifying images of dogs and cats, detecting different objects in an image, or performing facial recognition.

knitr::include_graphics("https://cis-8392-assignment-3.s3.amazonaws.com/intro_image.gif")

In this article, we will build a simple image classifier that predicts whether a presented image is an airplane, car, cat, dog, flower, fruit, motorbike, or person.

Natural Images

This dataset contains 6,899 images from 8 distinct classes compiled from various sources. The classes include airplane, car, cat, dog, flower, fruit, motorbike and person. The dataset is available at Kaggle.

The link to the dataset is https://www.kaggle.com/datasets/prasunroy/natural-images?datasetId=42780&language=null

Screenshot

Screenshot showing code section of dataset filtered by language R:

I created an S3 storage bucket in AWS and hosted my screenshot image in the bucket, giving public access to the object so that anyone with the URL can view the image.

url <- "https://cis-8392-assignment-3.s3.amazonaws.com/Screenshot+of+the+Code+section+of+the+Kaggle+dataset+filtered+with+R.PNG"
knitr::include_url(url)

Code

Loading the libraries required in the project:
library(keras)
library(tensorflow)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.3     v dplyr   1.0.7
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   2.0.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(imager)
## Warning: package 'imager' was built under R version 4.1.3
## Loading required package: magrittr
## 
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
## 
##     set_names
## The following object is masked from 'package:tidyr':
## 
##     extract
## 
## Attaching package: 'imager'
## The following object is masked from 'package:magrittr':
## 
##     add
## The following object is masked from 'package:stringr':
## 
##     boundary
## The following object is masked from 'package:tidyr':
## 
##     fill
## The following objects are masked from 'package:stats':
## 
##     convolve, spectrum
## The following object is masked from 'package:graphics':
## 
##     frame
## The following object is masked from 'package:base':
## 
##     save.image
library(caret)
## Warning: package 'caret' was built under R version 4.1.3
## Loading required package: lattice
## 
## Attaching package: 'caret'
## The following object is masked from 'package:purrr':
## 
##     lift
## The following object is masked from 'package:tensorflow':
## 
##     train
library(grid)
## 
## Attaching package: 'grid'
## The following object is masked from 'package:imager':
## 
##     depth
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

Exploratory Data Analysis

Let’s explore the data first before building the model. In image classification problems, it is common practice to put the images in separate folders based on the target class/label. For example, inside the train folder in our data, you can see that we have 8 different folders, one each for airplane, car, cat, dog, flower, fruit, motorbike, and person.

The output below lists the class folders and shows the properties of several sample images:
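The chunk that produced the output below is hidden; a sketch of what it might contain (hypothetical — the exact file sampling that yields twelve images below may differ) is:

```r
library(imager)  # load.image()
library(purrr)   # map()

# List the class folders inside the training directory
folder_list <- list.files("natural_images_small/train/")
folder_list

# Build full paths to each class folder
folder_path <- paste0("natural_images_small/train/", folder_list, "/")

# Load the first image of each class folder to inspect its properties
map(folder_path, function(x) load.image(paste0(x, list.files(x)[1])))
```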

## [1] "airplane"  "car"       "cat"       "dog"       "flower"    "fruit"    
## [7] "motorbike" "person"

## [[1]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3 
## 
## [[2]]
## Image. Width: 188 pix Height: 121 pix Depth: 1 Colour channels: 3 
## 
## [[3]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3 
## 
## [[4]]
## Image. Width: 114 pix Height: 111 pix Depth: 1 Colour channels: 3 
## 
## [[5]]
## Image. Width: 416 pix Height: 251 pix Depth: 1 Colour channels: 3 
## 
## [[6]]
## Image. Width: 348 pix Height: 429 pix Depth: 1 Colour channels: 3 
## 
## [[7]]
## Image. Width: 327 pix Height: 203 pix Depth: 1 Colour channels: 3 
## 
## [[8]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3 
## 
## [[9]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3 
## 
## [[10]]
## Image. Width: 256 pix Height: 256 pix Depth: 1 Colour channels: 3 
## 
## [[11]]
## Image. Width: 125 pix Height: 163 pix Depth: 1 Colour channels: 3 
## 
## [[12]]
## Image. Width: 100 pix Height: 100 pix Depth: 1 Colour channels: 3

Collecting the File Names

file_name <- map(folder_path, function(x) paste0(x, list.files(x))) %>%
  unlist()

Showing the first and last file names and the total number of training images:

head(file_name)
## [1] "natural_images_small/train/airplane/airplane_0001.jpg"
## [2] "natural_images_small/train/airplane/airplane_0002.jpg"
## [3] "natural_images_small/train/airplane/airplane_0003.jpg"
## [4] "natural_images_small/train/airplane/airplane_0004.jpg"
## [5] "natural_images_small/train/airplane/airplane_0005.jpg"
## [6] "natural_images_small/train/airplane/airplane_0006.jpg"
tail(file_name)
## [1] "natural_images_small/train/person/person_0555.jpg"
## [2] "natural_images_small/train/person/person_0556.jpg"
## [3] "natural_images_small/train/person/person_0557.jpg"
## [4] "natural_images_small/train/person/person_0558.jpg"
## [5] "natural_images_small/train/person/person_0559.jpg"
## [6] "natural_images_small/train/person/person_0560.jpg"
length(file_name)
## [1] 4480

Check Image Dimension

One important aspect of image classification is understanding the dimensions of the input images. You need to know the distribution of image dimensions to choose a proper input dimension when building the deep learning model. Let’s check the properties of the first image.

# Full Image Description
img <- load.image(file_name[1])
img
## Image. Width: 286 pix Height: 113 pix Depth: 1 Colour channels: 3

You can get information about the dimensions of the image. The height and width represent the height and width of the image in pixels. The colour channels indicate whether the image is in grayscale format (colour channels = 1) or in RGB format (colour channels = 3). To get the value of each dimension, we can use the dim() function. It returns the width, height, depth, and number of colour channels.

# Image Dimension
dim(img)
## [1] 286 113   1   3

So we have successfully loaded an image and obtained its dimensions. In the following code, we create a function that returns the height and width of an image.

# Function for acquiring width and height of an image
get_dim <- function(x){
  img <- load.image(x) 
  
  df_img <- data.frame(height = height(img),
                       width = width(img),
                       filename = x
                       )
  
  return(df_img)
}
get_dim(file_name[1])
##   height width                                              filename
## 1    113   286 natural_images_small/train/airplane/airplane_0001.jpg

Now we will sample 1,000 images from the file list and get the height and width of each image. We sample here because it would take quite a long time to load all the images.

# Randomly get 1000 sample images
set.seed(123)
sample_file <- sample(file_name, 1000)
# Run the get_dim() function for each image
file_dim <- map_df(sample_file, get_dim)
head(file_dim, 10)
##    height width                                                filename
## 1     237   261       natural_images_small/train/flower/flower_0223.jpg
## 2     281   328       natural_images_small/train/flower/flower_0271.jpg
## 3     319   358             natural_images_small/train/dog/dog_0547.jpg
## 4      85   287   natural_images_small/train/airplane/airplane_0526.jpg
## 5     256   256       natural_images_small/train/person/person_0371.jpg
## 6     100   100         natural_images_small/train/fruit/fruit_0186.jpg
## 7     308   238             natural_images_small/train/dog/dog_0162.jpg
## 8     372   458             natural_images_small/train/cat/cat_0022.jpg
## 9     122   201 natural_images_small/train/motorbike/motorbike_0011.jpg
## 10    100   201 natural_images_small/train/motorbike/motorbike_0086.jpg

Now let’s get the statistics for the image dimensions.

summary(file_dim)
##      height           width          filename        
##  Min.   :  54.0   Min.   :  67.0   Length:1000       
##  1st Qu.: 100.0   1st Qu.: 100.0   Class :character  
##  Median : 125.0   Median : 237.5   Mode  :character  
##  Mean   : 201.4   Mean   : 240.9                     
##  3rd Qu.: 256.0   3rd Qu.: 297.0                     
##  Max.   :2665.0   Max.   :2737.0

The image dimensions vary greatly. Some images are barely 54 pixels in height or 67 pixels in width, while others reach up to 2,737 pixels. Understanding the image dimensions will help us with the next part of the process: data preprocessing.

Rescaling and Normalizing the Pixel Values

The images’ pixel values are rescaled and normalized to the range 0 to 1.

This keeps the numbers small, making computation easier and faster.

Since pixel values range from 0 to 255, dividing every value by 255 converts them to the range 0 to 1.
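A quick arithmetic check of this rescaling:

```r
# Dividing 8-bit pixel values (0-255) by 255 maps them into [0, 1]
px <- c(0, 51, 128, 255)
px / 255
#> [1] 0.0000000 0.2000000 0.5019608 1.0000000
```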

train_datagen <- image_data_generator(rescale = 1/255)
validation_datagen <- image_data_generator(rescale = 1/255)
test_datagen <- image_data_generator(rescale = 1/255)

Data Preprocessing

Data preprocessing for images is fairly simple and can be done in a single step in the following section.

Since we have a sufficiently large training set, we do not need to create artificial data using a method called data augmentation. Data augmentation is a useful technique for increasing the size of the training set without acquiring new images, but here we have enough data for building and training the deep learning convnet model.

Here we read the images and convert them to tensors while rescaling the pixel values to the [0, 1] interval.
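For reference, if augmentation were ever needed (for example, to combat the overfitting noted in the conclusion), it could be configured through the same image_data_generator() function; the parameter values below are purely illustrative:

```r
library(keras)

# A hypothetical augmented training generator (values are illustrative)
augmented_datagen <- image_data_generator(
  rescale = 1/255,          # still rescale pixel values to [0, 1]
  rotation_range = 40,      # randomly rotate images up to 40 degrees
  width_shift_range = 0.2,  # randomly shift horizontally up to 20%
  height_shift_range = 0.2, # randomly shift vertically up to 20%
  zoom_range = 0.2,         # randomly zoom in up to 20%
  horizontal_flip = TRUE    # randomly flip images horizontally
)
```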

Creating Train Generator

Now we can feed our image data into the generators using flow_images_from_directory(). The data is located inside the natural_images_small folder, and the training images sit inside the train folder, so the directory is natural_images_small/train. We do this for the training, validation, and test data.

train_generator <- flow_images_from_directory(
  "natural_images_small/train", # Target directory
  train_datagen, # Training data generator
  target_size = c(150, 150), # Resizes all images to 150 × 150
  batch_size = 20, # 20 samples in one batch
  class_mode = "categorical" # Because we use categorical_crossentropy loss,
  # we need categorical labels.
)

Creating Validation Generator

validation_generator <- flow_images_from_directory(
  "natural_images_small/validation",
  validation_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "categorical"
)

Creating Test Generator

test_generator <- flow_images_from_directory(
  "natural_images_small/test", # Target directory for the test data
  test_datagen,
  target_size = c(150, 150),
  batch_size = 20,
  class_mode = "categorical"
)

Model Architecture

A convnet model has been built for the deep learning classification.

The model was built and trained beforehand in an R script and saved so that it can be loaded again.

Now we load the saved model in R Markdown.
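The architecture code lives in that separate script and is not reproduced in this article; a hypothetical sketch of a convnet for this task, assuming the 150 × 150 × 3 inputs and 8 classes used by the generators above, might look like:

```r
library(keras)

# Hypothetical architecture sketch -- not necessarily the saved model
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(150, 150, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 8, activation = "softmax")  # one unit per class

model %>% compile(
  loss = "categorical_crossentropy",  # matches class_mode = "categorical"
  optimizer = "rmsprop",
  metrics = "accuracy"
)
```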

Model Fitting

model_file = "natural_images_model.h5"
history_file = "natural_images_fit_history.rds"
model_v2 <- load_model_hdf5(model_file)
history_v2 <- read_rds(history_file)

Visualizing the Training History and Evaluating on Test Data

Plotting the loss and accuracy for training and validation data.

#Plotting the accuracy and loss for training and validation data

plot(history_v2)
## `geom_smooth()` using formula 'y ~ x'

#Evaluating the model on test data

model_v2 %>%
  evaluate_generator(test_generator, steps = 50)
## $loss
## [1] 0.06363041
## 
## $acc
## [1] 0.982

The model’s accuracy on the test data is around 98%.
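To obtain per-image predictions rather than aggregate metrics, the loaded model could be used roughly like this (a sketch; the generator ordering and the class index mapping should be checked against test_generator$class_indices):

```r
# Predicted class probabilities: one row per image, one column per class
pred_prob <- model_v2 %>%
  predict_generator(
    test_generator,
    steps = ceiling(test_generator$n / test_generator$batch_size)
  )

# Map the highest-probability column back to a class label
class_names <- names(test_generator$class_indices)
pred_class <- class_names[max.col(pred_prob)]
head(pred_class)
```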

Conclusion:

1. I have built the convnet model for predicting 8 classes from the natural images dataset.

2. From the visualization of loss and accuracy across epochs, I observed that the accuracy of the model improves with more epochs.

3. I observed an overfitting issue from the way loss varies across epochs; using a data augmentation technique could help overcome the overfitting.

4. With around 30 epochs, the model gave an accuracy of around 98% on the testing data.